<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Search Results </title>

     
     
        <!-- Framework CSS -->
    <link rel="stylesheet" href="/site_media/blueprint-css/screen.css" type="text/css" media="screen, projection, print">
    <!-- link rel="stylesheet" href="../../blueprint/print.css" type="text/css" media="print" -->
    <!--[if lt IE 8]><link rel="stylesheet" href="../../blueprint/ie.css" type="text/css" media="screen, projection"><![endif]-->

    <!-- Import fancy-type plugin for the sample page. -->
    <!-- link rel="stylesheet" href="../../blueprint/plugins/fancy-type/screen.css" type="text/css" media="screen, projection" -->
    <link rel="stylesheet" href="/site_media/blueprint-css/plugins.buttons.screen.css" type="text/css" media="screen, projection">
<style type="text/css">

A               { color: #336699; text-decoration: underline; }
A:link          { color: #000080; text-decoration: underline; }
A:visited       { color: #336699; text-decoration: underline; }
A:active        { color: #9966FF; text-decoration: underline; }
A:hover         { color: #CC6633; text-decoration: none; }


input.text, input.title {
margin-bottom: 0px; 
}

/* .results div { padding:0 0 1.38em; } */
.results div.notice { padding:0.8em; text-align: center; }
.button { font-size: 1.5em; height: 1.8em; background-color: #a5d4eb; }
.next { background-color: #a5d4eb; }
select { background-color: #a5d4eb; }
/* input { background-color: #a5d4eb; border: 1px solid #eee; } */
a.close { text-decoration: none; position:relative; left: 4px; }
.facet { border:2px solid #DDDDDD; padding: .5em; margin-left: 3em; height: 1.8em; top: .5em; }
.facet a.close { vertical-align:super; text-align: right; }
.wrca-logo {  position:relative; left: 9px; }
h1 a { text-decoration:none; }

.resultCount{ color: #777; }


  .clear { text-decoration: none;  position: relative; left: -2.2em; visibility: hidden; 
padding: 1em; font-size: 1.5em; }
  .clear:link { color: #777; text-decoration: none; } 
 /* input#submit { position:relative; left: -2em; } */
</style>


  </head>
  <body >
    <div class="container">
<div class="header">



<div class="span-23 last">
<h1 style="">DPLA Vertical Search Demo</h1>
</div>

<form action="/search/" class="span-24">
    <input id="query" type="text" class="title" name="q" value="">
<!-- a class="clear" id="clearQuery" href="#" title="Clear">×</a -->
              <input id="submit" type="submit" class="button" value="search"> 

<div class="resultCount">find results</div>

</form>

<hr>
</div>
      <div class="results span-16 colborder">
<!-- 
<div class="dym">Did you mean <a href="">loren gypsum</a>?</div>
-->

                  

                

<p style="line-height: 24px; font-size: 125%; ">Welcome to the Digital Public Library of America Vertical Search Demo, created for the <a href="http://blogs.law.harvard.edu/dpla/">DPLA Beta Sprint</a> by 
the <a href="http://www.cdlib.org/">California Digital Library</a>.
</p>

                  
<h2>Why explore a web crawler approach?</h2>
<ul>
<li>No need to harmonize and ingest metadata from individual collections</li>
<li>Leverages the curation expertise of libraries to focus enterprise search technology</li>
<li>Builds on existing technology we can easily deploy</li>
<!-- li>Improving search engine optimization will also enhance discovery on the open web</li -->


</ul>
 
<h2>What technology does it use?</h2>
   <p>This demo is built using <a href="http://nutch.apache.org/">Apache Nutch 1.3</a> for web crawling and
<a href="http://www.lucidimagination.com/products/certified/solr">LucidWorks Certified Distribution for Solr Release 3.2</a> for search, with a simple
web interface built on Django templates.
The <a href="https://ccp.cloudera.com/display/CDH2DOC/Configuring%20and%20Running%20CDH%20Cloud%20Scripts">Cloudera Distribution of Hadoop</a> was run on <a 
href="http://aws.amazon.com/ec2/">Amazon EC2</a> to create the index for the demo. <a href="http://code.google.com/p/public-digital-collection/">Code and configuration files</a> are available on Google 
Code.</p>

<p>The demo focuses on configuring the crawl and the search with minimal programming. The functionality could be improved and the content expanded in a number of ways to provide a more robust search experience; a few development ideas for future phases of the project are listed below.</p>


<h2>What content has been targeted?</h2>
<p>The demo targets a range of websites with digital cultural heritage content. The index currently includes approximately 300,000 unique URLs from 100 sources. The <a 
href="http://imlsdcc.grainger.uiuc.edu/">IMLS Digital Collections
and Content</a> (a registry of digital materials funded by IMLS National Leadership Grants and selected LSTA-supported collections) provides the foundation for the demo index. In addition, resources
from the University of
California, other libraries and cultural heritage institutions, and content aggregation websites were included in the initial crawl. The full seed list is maintained on <a 
href="http://code.google.com/p/public-digital-collection/source/browse/#hg%2Furls">Google Code</a>.</p>

<p>If you have ideas for additional resources that should be included in the demo or a later
phase, please <a href="https://spreadsheets.google.com/spreadsheet/viewform?formkey=dE54cWtlN1V0bU1HMjZEV0N3d25na0E6MQ">let us know by suggesting a site</a>.</p>

<p>Integrating mass-digitized monograph collections such as <a href="http://www.hathitrust.org/">HathiTrust</a> was not investigated in this demo, for both technical and content development 
reasons. On the technical side, while we believe a 
crawler approach could be used to aggregate monographs, digitized monographs do not have the same granularity and characteristics as
the web pages and short documents typically found on the web. The out-of-the-box stack we are using is tuned for 
web crawling and search, and we expect that it would not be optimized for book search by default. On the content side, we believe that providing 
access to digitized local and cultural history is an important role of libraries, and that this content should be made more visible.
</p>


   <!-- div><a href="https://code.google.com/p/public-digital-collection/source/browse/#hg%2Furls">list</a> Description</div>
   <div>Overlap with CLIR</div>
   <div>Other possible approaches to building a target list (more defined scope, etc.)</div -->
<h2>How could it be built out?</h2>
   <!-- p>Different use cases...
  Defined topics... User suggestions... -->
<p>There are several areas in which this vertical search could be expanded and refined:</p>
<ul>
<li><b>More content:</b> future phases of the project could include digitized monographs (see above) and contextual resources such as related Wikipedia entries (or portions of them).</li>
<li><b>Better user experience:</b> programming effort could be spent on improving the relevance ranking of search results and adding facets (by developing plugins or filters for Nutch or Solr that add data 
to the 
records, embedding linked data in content pages, and/or applying machine learning techniques).</li>
<li><b>Connections with other projects:</b> an API to the vertical search index has been provided to the CLIR/DLF DPLA beta project (which is exploring other interfaces for the IMLS Digital 
Collections 
and Content list), and it could be shared with other groups exploring similar themes.</li>
</ul>

<!-- <input type="submit" class="button" value="|◀"></input> -->

      </div>

      <div class="span-7 last">


<hr>



<div class="prepend-1 prepend-top vert-banner">
<a href="https://spreadsheets.google.com/spreadsheet/viewform?formkey=dE54cWtlN1V0bU1HMjZEV0N3d25na0E6MQ">suggest a resource</a>
</div>
      </div>
<hr>



<div class="footer">
<div><a href="http://www.cdlib.org/">CDL's</a> DPLA Vertical Search Demo |
<a href="http://code.google.com/p/public-digital-collection/">project code</a> |
<a href="http://public-digital-collection.googlecode.com/hg/pdc-mock.pdf">wireframes (PDF)</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.cdlib.org/about/privacy.html">Privacy Policy</a>
</div>
<div>
<a href="http://blogs.law.harvard.edu/dpla/">Digital Public Library of America Beta Sprint</a> |
<a href="http://cyber.law.harvard.edu/dpla/Beta_Sprint_Statements_of_Interest">Projects</a>
</div>
</div>
</div>
<script type="text/javascript" src="./x.js"></script>
<!-- script>;</script -->
  </body>
</html>
